Quick takes

Back in October 2024, I tried to test various LLM chatbots with the question: "Is there a way to convert a correlation to a probability while preserving the relationship 0 = 1/n?" Years ago, I came up with an unpublished formula that does just that:

p(r) = (n^r * (r + 1)) / (2^r * n)

So I was curious whether they could figure it out. Alas, back in October 2024, they all made up formulas that didn't work.

Yesterday, I tried the same question on ChatGPT and, while it didn't get it quite right, it came very, very close. So I modified the question to be more specific: "Is there a way to convert a correlation to a probability while preserving the relationships 1 = 1, 0 = 1/n, and -1 = 0?" This time, it came up with a formula that was different from and simpler than my own, and... it actually works!

I tried this same prompt with a bunch of different LLM chatbots and got the following:

- Correct on the first prompt: GPT-4o, Claude 3.7
- Correct after explaining that I wanted a non-linear, monotonic function: Gemini 2.5 Pro, Grok 3
- Failed: DeepSeek-V3, Mistral Le Chat, QwenMax2.5, Llama 4
- Took too long thinking and I stopped it: DeepSeek-R1, QwQ

All the correct models got some variation of:

p(r) = ((r + 1) / 2)^log2(n)

This is notably simpler and arguably more elegant than my earlier formula. It also, unlike my old formula, has an easy-to-derive inverse function. So yeah. AI is now better than me at coming up with original math.
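A quick numerical sanity check of those anchor points, and of the new formula's inverse (a minimal Python sketch of my own; the choice n = 8 is just an illustrative assumption):

```python
import math

def p_old(r, n):
    """My original formula: p(r) = (n^r * (r + 1)) / (2^r * n)."""
    return (n**r * (r + 1)) / (2**r * n)

def p_new(r, n):
    """The models' formula: p(r) = ((r + 1) / 2)^log2(n)."""
    return ((r + 1) / 2) ** math.log2(n)

def p_new_inv(p, n):
    """Inverse of the models' formula: r = 2 * p^(1/log2(n)) - 1."""
    return 2 * p ** (1 / math.log2(n)) - 1

n = 8  # arbitrary example value of n
for r in (-1.0, 0.0, 0.5, 1.0):
    print(f"r={r:+.1f}  old={p_old(r, n):.4f}  new={p_new(r, n):.4f}")
# Both formulas hit the anchors p(-1) = 0, p(0) = 1/n = 0.125, p(1) = 1,
# though they differ at intermediate correlations (e.g. r = 0.5).

# Round-trip through the new formula's inverse:
assert math.isclose(p_new_inv(p_new(0.4, n), n), 0.4)
```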
I guess orgs need to be more careful about who they hire as forecasting/evals researchers in light of a recently announced startup. Sometimes things happen, but three people at the same org...

This is also a massive burning of the commons. It is valuable for forecasting/evals orgs to be able to hire people with a diversity of viewpoints in order to counter bias. It is valuable for folks to be able to share information freely with folks at such forecasting orgs without having to worry about them going off and doing something like this. However, this only works if those less worried about AI risks who join such a collaboration don't use the knowledge they gain to cash in on the AI boom in an acceleratory way. Doing so undermines the very point of such a project, namely, to try to make AI go well. Doing so is incredibly damaging to trust within the community.

Now let's suppose you're an x-risk funder considering whether to fund their previous org. This org does really high-quality work, but the argument for them being net-positive is now significantly weaker. This is quite likely to make finding future funding harder for them.

This is less about attacking those three folks and more about noting that we need to strive to avoid situations where things like this happen in the first place. That requires us to be more careful in terms of who gets hired. There have been some discussions on the EA Forum along the lines of "why do we care about value alignment? Shouldn't we just hire whoever can best do the job?" My answer to that is that it's myopic to only consider what happens whilst they're working for you. Hiring someone or offering them an opportunity empowers them; you need to consider whether they're someone you want to empower[1].

1. ^ Admittedly, this isn't quite the same as value alignment. Suppose someone were diligent, honest, wise, and responsible. You might want to empower them even if their views were extremely different from yours. Stronger: even if
Per Bloomberg, the Trump administration is considering restricting the equivalency determination for 501(c)(3)s as early as Tuesday. The equivalency determination allows 501(c)(3)s to regrant money to foreign, non-tax-exempt organisations while maintaining tax-exempt status, so long as an attorney or tax practitioner claims the organisation is equivalent to a local tax-exempt one. I'm not an expert on this, but it sounds really bad. I guess it remains to be seen if they go through with it.

Regardless, the administration is allegedly also preparing to directly strip environmental and political non-profits (i.e. groups he doesn't like, not necessarily just any policy org) of their tax-exempt status. In the past week, he's also floated trying to rescind the tax-exempt status of Harvard. From what I understand, such an Executive Order is illegal under U.S. law (to whatever extent that matters anymore), unless Trump instructs the State Department to designate them foreign terrorist organisations, at which point all their funds are frozen too.

These are dark times. Stay safe 🖤
I'm not sure how to word this properly, and I'm uncertain about the best approach to this issue, but I feel it's important to get this take out there.

Yesterday, Mechanize was announced: a startup focused on developing virtual work environments, benchmarks, and training data to fully automate the economy. The founders include Matthew Barnett, Tamay Besiroglu, and Ege Erdil, who are leaving (or have left) Epoch AI to start this company.

I'm very concerned we might be witnessing another situation like Anthropic, where people with EA connections start a company that ultimately increases AI capabilities rather than safeguarding humanity's future. But this time, we have a real opportunity for impact before it's too late. I believe this project could potentially accelerate capabilities, increasing the odds of an existential catastrophe.

I've already reached out to the founders on X, but perhaps there are people more qualified than me who could speak with them about these concerns. In my tweets to them, I expressed worry about how this project could speed up AI development timelines, asked for a detailed write-up explaining why they believe this approach is net positive and low risk, and suggested an open debate on the EA Forum. While their vision of abundance sounds appealing, rushing toward it might increase the chance we never reach it due to misaligned systems.

I personally don't have a lot of energy or capacity to work on this right now, nor do I think I have the required expertise, so I hope that others will pick up the slack. It's important we approach this constructively and avoid attacking the three founders personally. The goal should be productive dialogue, not confrontation.

Does anyone have thoughts on how to productively engage with the Mechanize team? Or am I overreacting to what might actually be a beneficial project?
I just learned about Zipline, the world's largest autonomous drone delivery system, from YouTube tech reviewer Marques Brownlee's recent video, so I was surprised to see Zipline pop up in a GiveWell grant writeup of all places.

I admittedly had the intuition that if you're optimising for cost-effectiveness as hard as GiveWell does, and your prior is as skeptical as theirs is, then the "coolness factor" would've been stripped clean off whatever interventions pass the bar. Brownlee's demo both blew my mind with its coolness (he placed an order on mobile for a power bank and it arrived by air in thirty seconds flat, yeesh) and also seemed the complete opposite of cost-effective (caveating that I know nothing about drone delivery economics).

Quoting their "in a nutshell" section:

Okay, but what about cost-effectiveness? Their "main reservations" section says:

Is there any evidence of cost-effectiveness at all, then? According to Zipline, yes — e.g. quoting the abstract from their own 2025 modelling study:

That's super cost-effective. For context, the standard willingness-to-pay to avert a DALY is 1x per capita GDP, or $2,100 in Ghana, so 35-50x higher. Also:

(GiveWell notes that they'd given Zipline's study a look and "were unable to quickly assess how key parameters like program costs and the impact of the program on vaccination uptake and disease were being estimated". Neither can I. Still pretty exciting.)